Access to Unlabeled Data can Speed up Prediction Time

Authors

  • Ruth Urner
  • Shai Shalev-Shwartz
  • Shai Ben-David
Abstract

Semi-supervised learning (SSL) addresses the problem of training a classifier using a small number of labeled examples and many unlabeled examples. Most previous work on SSL has focused on how the availability of unlabeled data can improve the accuracy of the learned classifiers. In this work we study how unlabeled data can be beneficial for constructing faster classifiers. We propose an SSL algorithmic framework that can utilize unlabeled examples for learning classifiers from a predefined set of fast classifiers. We formally analyze conditions under which our algorithmic paradigm obtains significant improvements from the use of unlabeled data. As a side benefit of our analysis, we propose a novel quantitative measure of the so-called cluster assumption. We demonstrate the potential merits of our approach through experiments on the MNIST data set, showing that, when a sufficiently large unlabeled sample is available, a fast classifier can be learned from far fewer labeled examples than without such a sample.
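The abstract does not spell out the algorithm, so what follows is only a rough sketch of the general idea, not the authors' implementation. It assumes a two-step scheme. First, a slow but accurate learner labels the large unlabeled sample; here we assume 1-nearest-neighbour over the labeled set. Second, a classifier from a fast class is fit on that pseudo-labeled sample; here we assume linear threshold functions trained with the perceptron. Afterwards a prediction costs one dot product instead of a search over the labeled set. All function names and the toy data are hypothetical.

```python
import random

# Illustrative sketch only: 1-NN as the "slow" learner and a perceptron-trained
# linear classifier as the "fast" class are assumptions, not the paper's choices.

def nearest_neighbor_label(x, labeled):
    """Assumed slow learner (1-NN): label of the closest labeled point.
    Its prediction cost grows with the number of labeled points."""
    _, label = min(labeled, key=lambda pl: sum((a - b) ** 2 for a, b in zip(pl[0], x)))
    return label

def train_perceptron(data, epochs=1000):
    """Assumed fast class: linear threshold functions sign(w.x + b), labels in {-1, +1}.
    Stops early once a full pass over the data makes no mistakes."""
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in data:
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:
            break
    return w, b

def predict(model, x):
    """Fast prediction: one dot product, independent of the sample sizes."""
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

def ssl_fast_classifier(labeled, unlabeled, epochs=1000):
    # Step 1: use the slow learner to label the (large) unlabeled sample.
    pseudo_labeled = [(x, nearest_neighbor_label(x, labeled)) for x in unlabeled]
    # Step 2: fit a classifier from the fast class on the pseudo-labeled sample.
    return train_perceptron(pseudo_labeled, epochs)

if __name__ == "__main__":
    rng = random.Random(0)
    labeled = [((0.0, 0.0), -1), ((4.0, 4.0), 1)]
    # Two well-separated clusters of unlabeled points (synthetic toy data, not MNIST).
    unlabeled = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
                 for cx, cy in [(0.0, 0.0), (4.0, 4.0)] for _ in range(50)]
    model = ssl_fast_classifier(labeled, unlabeled)
    print(predict(model, (0.2, -0.1)), predict(model, (4.1, 3.9)))
```

With only two labeled points, the toy example relies on the unlabeled clusters to give the fast linear classifier enough data to fit. This loosely mirrors the cluster-style setting the abstract alludes to.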

Similar resources

Presenting a Method of Scheduling the Access to Data Relating to V-I in VANET

Today, vehicular ad hoc networks between automobiles have been accepted by the public, and many people are interested in exchanging information with other vehicles as well as with roadside equipment while driving. When a number of vehicles want to access the data stored in an RSU, the priority of service comes into play. In this case, some methods, such as D*S and D*S/N, have been presented ...

Simulate Congestion Prediction in a Wireless Network Using the LSTM Deep Learning Model

Since their beginning, wireless networks have become widespread due to the increasing number of wireless devices such as smartphones and laptops, and the proliferation of networks coincides with the high speed and ease of use of the Internet and the delivery of various data such as video clips and games. This is where the congestion problem arises, and the aim of the research is t...

Automatic Data Clustering Using an Improved Imperialist Competitive Algorithm

The Imperialist Competitive Algorithm (ICA) is considered a prime meta-heuristic algorithm for finding the global optimal solution in optimization problems. This paper presents a use of ICA for automatic clustering of huge unlabeled data sets. By using a proper structure for each of the chromosomes and the ICA, at run time, the suggested method (ACICA) finds the optimum number of clusters while optim...

Real-Time Building Information Modeling (BIM) Synchronization Using Radio Frequency Identification Technology and Cloud Computing System

Online observation of a construction site and its processes brings significant advantages to all business sectors. BIM is the combination of a 3D model of the project and a project-planning program, which extends the project planning model up to 6D (adding time, cost, and material information dimensions to the model). RFID technology is an appropriate information synchronization tool between the...

Publication date: 2011